Learning Shaping Rewards in Model-based Reinforcement Learning

Authors

Marek Grzes

Daniel Kudenko

Abstract

Potential-based reward shaping has been shown to be a powerful method for improving the convergence rate of reinforcement learning agents. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential that is used to shape the reward given to the learning agent. In this paper, we show how, in the absence of knowledge to define the potential manually, the potential function can be learned online in parallel with the actual reinforcement learning process. We propose an approach for the prototypical model-based R-max algorithm that learns the potential function using the free space assumption about the transitions in the environment. The novel algorithm is presented and evaluated both empirically and theoretically. Specifically, the proposed algorithm is shown to learn an admissible potential, which is required by the R-max algorithm with potential-based reward shaping.
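For readers unfamiliar with the term, the following is a minimal sketch of the general potential-based shaping term of Ng, Harada, and Russell (1999), F(s, s') = gamma * Phi(s') - Phi(s), added to the environment reward. It is not the algorithm proposed in this paper and does not learn the potential; the corridor domain, the `GOAL` constant, and the hand-written `potential` function are assumptions made purely for illustration.

```python
from typing import Callable

# Hypothetical 1-D corridor: states are integer cells, the goal is cell 10.
GOAL = 10


def potential(state: int) -> float:
    """Illustrative hand-written potential: negative distance to the goal."""
    return -float(abs(GOAL - state))


def shaped_reward(
    reward: float,
    state: int,
    next_state: int,
    phi: Callable[[int], float],
    gamma: float = 0.99,
) -> float:
    """Return r + F(s, s'), where F(s, s') = gamma * phi(s') - phi(s)."""
    return reward + gamma * phi(next_state) - phi(state)


if __name__ == "__main__":
    # A step toward the goal earns a positive shaping bonus,
    # a step away from it earns a negative one.
    print(shaped_reward(0.0, 3, 4, potential, gamma=1.0))
    print(shaped_reward(0.0, 4, 3, potential, gamma=1.0))
```

In this sketch the potential is fixed by hand; the abstract above concerns the case where no such knowledge is available and the potential has to be learned online.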


Similar resources

Online learning of shaping rewards in reinforcement learning


Potential-Based Shaping and Q-Value Initialization are Equivalent

Shaping has proven to be a powerful but precarious means of improving reinforcement learning performance. Ng, Harada, and Russell (1999) proposed the potential-based shaping algorithm for adding shaping rewards in a way that guarantees the learner will learn optimal behavior. In this note, we prove certain similarities between this shaping algorithm and the initialization step required for seve...


Reward Shaping in Episodic Reinforcement Learning

Recent advancements in reinforcement learning confirm that reinforcement learning techniques can solve large scale problems leading to high quality autonomous decision making. It is a matter of time until we will see large scale applications of reinforcement learning in various sectors, such as healthcare and cyber-security, among others. However, reinforcement learning can be time-consuming be...


Imitation in Reinforcement Learning

The promise of imitation is to facilitate learning by allowing the learner to observe a teacher in action. Ideally this will lead to faster learning when the expert knows an optimal policy. Imitating a suboptimal teacher may slow learning, but it should not prevent the student from surpassing the teacher’s performance in the long run. Several researchers have looked at imitation in the context ...


Potential-based difference rewards for multiagent reinforcement learning

Difference rewards and potential-based reward shaping can both significantly improve the joint policy learnt by multiple reinforcement learning agents acting simultaneously in the same environment. Difference rewards capture an agent’s contribution to the system’s performance. Potential-based reward shaping has been proven to not alter the Nash equilibria of the system but requires domain-speci...


Publication date: 2009